CLAIMS 

What is claimed is: 

1. A method for transforming a hypermedia document containing 
main content and auxiliary data, the method comprising: 

converting the hypermedia document into a string containing a 
plurality of first values and a plurality of second values, the plurality of 
first values corresponding to a plurality of formatting code segments 
within the hypermedia document and the plurality of second values 
corresponding to a plurality of text segments within the hypermedia 
document; 

applying a low-pass filter to the string containing the plurality of 
first values and the plurality of second values; and 

determining location of the main content within the hypermedia 
document using an output of the low-pass filter. 

2. The method of claim 1 further comprising: 

coding the main content in a mobile device language for display on 
a mobile device. 

3. The method of claim 1, wherein the hypermedia document is a file
written in any one of a hypertext markup language (HTML), a dynamic
HTML, an extensible HTML (XHTML), an extensible markup language
(XML), JavaScript, and Visual Basic (VB) script.

4. The method of claim 1, wherein converting the hypermedia
document further comprises:

parsing the hypermedia document to identify the plurality of
formatting code segments and the plurality of text segments within the
hypermedia document;

assigning a first value to each character within the plurality of
formatting code segments; and

assigning a second value to each character within the plurality of
text segments.

5. The method of claim 4 further comprising truncating a length of
one of the plurality of formatting code segments when the length of said
one of the plurality of formatting code segments exceeds a threshold tag
length value.

6. The method of claim 1, wherein each of the plurality of first values
is equal to zero.

7. The method of claim 1, wherein each of the plurality of second
values is equal to one.

8. The method of claim 1, wherein the low-pass filter is a moving
average filter.

9. The method of claim 8, wherein the output of the low-pass filter
represents a distribution of text density over the hypermedia document.

10. The method of claim 9, wherein determining the location of the
main content further comprises:

searching an output of the low-pass filter to find a position of a
central peak corresponding to the highest text density within the
hypermedia document; and

determining a starting position of a high text density area and an
ending position of the high text density area using the position of the
central peak and a threshold text density value.

11. The method of claim 10, wherein the threshold text density value is
determined empirically.

12. The method of claim 1 further comprising:

varying the second value for one of the plurality of text segments
based upon a weight associated with said one of the plurality of text
segments.

13. The method of claim 1, wherein applying the low-pass filter further
comprises:

applying a median filter to the string containing the plurality of
first values and the plurality of second values to suppress high frequency
signal oscillations associated with the string; and

applying a moving average filter to an output of the median filter
to combine a plurality of closely spaced text segments contained in the
output of the median filter into a set of larger text segments.

14. The method of claim 13, wherein determining the location of the
main content further comprises:

applying a rising and falling edge detector to an output of the
median filter to identify the largest reasonably contiguous text segment
within the set of larger segments.

15. The method of claim 14, wherein the largest reasonably contiguous
text segment is identified using a threshold text value.

16. An apparatus for transforming a hypermedia document containing
main content and auxiliary data, the apparatus comprising:

a converter to convert the hypermedia document into a string
containing a plurality of first values and a plurality of second values, the
plurality of first values corresponding to a plurality of formatting code
segments within the hypermedia document and the plurality of second
values corresponding to a plurality of text segments within the
hypermedia document;

a low-pass filter to apply to the string containing the plurality of
first values and the plurality of second values; and

a location calculator to determine location of the main content
within the hypermedia document using an output of the low-pass filter.

17. The apparatus of claim 16 further comprising:

an encoder to code the main content in a mobile device language
for display on a mobile device.

18. The apparatus of claim 16, wherein the hypermedia document is a
file written in any one of a hypertext markup language (HTML), a
dynamic HTML, an extensible HTML (XHTML), an extensible markup
language (XML), JavaScript, and Visual Basic (VB) script.

19. The apparatus of claim 16 further comprising a parser to identify
the plurality of formatting code segments and the plurality of text
segments within the hypermedia document.

20. The apparatus of claim 16 wherein the converter is to convert the
hypermedia document by assigning a first value to each character within
the plurality of formatting code segments and assigning a second value to
each character within the plurality of text segments.

21. The apparatus of claim 20 wherein the converter is to truncate a
length of one of the plurality of formatting code segments when the length
of said one of the plurality of formatting code segments exceeds a
threshold tag length value.

22. The apparatus of claim 16, wherein each of the plurality of first
values is equal to zero.

23. The apparatus of claim 16, wherein each of the plurality of second
values is equal to one.

24. The apparatus of claim 16, wherein the low-pass filter is a moving
average filter.

25. The apparatus of claim 24, wherein the output of the low-pass filter
represents a distribution of text density over the hypermedia document.

26. The apparatus of claim 25, wherein the location calculator is to
determine the location of the main content by searching an output of the
low-pass filter to find a position of a central peak corresponding to the
highest text density within the hypermedia document, and by
determining a starting position of a high text density area and an ending
position of the high text density area using the position of the central peak
and a threshold text density value.

27. The apparatus of claim 16 wherein the converter is to vary the
second value for one of the plurality of text segments based upon a weight
associated with said one of the plurality of text segments.

28. The apparatus of claim 16, wherein the low-pass filter further
comprises:

a median filter to be applied to the string containing the plurality of
first values and the plurality of second values to suppress high frequency
signal oscillations associated with the string; and

a moving average filter to be applied to an output of the median
filter to combine a plurality of closely spaced text segments contained in
the output of the median filter into a set of larger text segments.

29. The apparatus of claim 28, wherein the location calculator is to
determine the location of the main content by applying a rising and falling
edge detector to an output of the median filter to identify the largest
reasonably contiguous text segment within the set of larger segments.

30. The apparatus of claim 29, wherein the location calculator is to
identify the largest reasonably contiguous text segment using a threshold
text value.

31. A medium readable by a machine, the medium having stored
thereon a sequence of instructions which, when executed by the machine,
cause the machine to:

convert the hypermedia document into a string containing a
plurality of first values and a plurality of second values, the plurality of
first values corresponding to a plurality of formatting code segments
within the hypermedia document and the plurality of second values
corresponding to a plurality of text segments within the hypermedia
document;

apply a low-pass filter to the string containing the plurality of first
values and the plurality of second values; and

determine location of the main content within the hypermedia
document using a low-pass filter output.



32. A method for transforming a web page containing main content
and auxiliary data, the method comprising:

converting the web page into a string containing a plurality of first
values and a plurality of second values, the plurality of first values
corresponding to a plurality of formatting code segments within the web
page and the plurality of second values corresponding to a plurality of
text segments within the web page;

applying a moving average filter to the string containing the
plurality of first values and the plurality of second values to generate an
output representing a distribution of text density over the web page;

searching the output of the moving average filter to find a position
of a central peak corresponding to the highest text density within the web
page;

determining a starting position of a high text density area and an
ending position of the high text density area using the position of the
central peak and a threshold text density value to determine location of
the main content within the web page; and

coding the main content in a mobile device language for display on
a mobile device.

33. The method of claim 32 further comprising truncating a length of
one of the plurality of formatting code segments when the length of said
one of the plurality of formatting code segments exceeds a threshold tag
length value.

34. The method of claim 32, wherein each of the plurality of first values
is equal to zero and each of the plurality of second values is equal to one.

35. The method of claim 32 further comprising:

varying the second value for one of the plurality of text segments
based upon a weight associated with said one of the plurality of text
segments.

36. A method for transforming a web page containing main content
and auxiliary data, the method comprising:

converting the web page into a string containing a plurality of first
values and a plurality of second values, the plurality of first values
corresponding to a plurality of formatting code segments within the web
page and the plurality of second values corresponding to a plurality of
text segments within the web page;

applying a median filter to the string containing the plurality of
first values and the plurality of second values to suppress high frequency
signal oscillations associated with the string;

applying a moving average filter to an output of the median filter
to combine a plurality of closely spaced text segments contained in the
output of the median filter into a set of larger text segments;

applying a rising and falling edge detector to an output of the
median filter to identify the largest reasonably contiguous text segment
within the set of larger segments using a threshold text value, the largest
reasonably contiguous text segment corresponding to the main content of
the web page; and

coding the main content in a mobile device language for display on
a mobile device.

37. The method of claim 36 further comprising truncating a length of
one of the plurality of formatting code segments when the length of said
one of the plurality of formatting code segments exceeds a threshold tag
length value.

38. The method of claim 36, wherein each of the plurality of first values
is equal to zero and each of the plurality of second values is equal to one.

39. The method of claim 36 further comprising:

varying the second value for one of the plurality of text segments
based upon a weight associated with said one of the plurality of text
segments.
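
For illustration only, and not as a limitation of the claims, the steps recited in claims 1, 4, 6 to 8 and 10 might be sketched in Python roughly as follows. Everything named here is an assumption made for the sketch and does not come from the claims: the function names to_signal, moving_average and locate_main_content, the simplified regular expression used to recognize formatting code segments, the window length, and the threshold value. Tag-length truncation (claim 5), segment weighting (claim 12) and the median filter and edge detector variant (claims 13 to 15) are not shown.

```python
import re

# Assumption: a formatting code segment is anything between "<" and ">".
# A real parser would also need to handle comments, scripts and quoted ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def to_signal(document, first_value=0.0, second_value=1.0):
    """Assign first_value to each tag character and second_value to each text character."""
    signal = []
    pos = 0
    for match in TAG_PATTERN.finditer(document):
        signal.extend([second_value] * (match.start() - pos))          # text before the tag
        signal.extend([first_value] * (match.end() - match.start()))  # the tag itself
        pos = match.end()
    signal.extend([second_value] * (len(document) - pos))             # trailing text
    return signal


def moving_average(signal, window):
    """Centered moving average; near the edges it averages the samples available."""
    half = window // 2
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    n = len(signal)
    density = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        density.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return density


def locate_main_content(density, threshold):
    """Find the density peak, then widen the span while density stays >= threshold.

    Returns a half-open (start, end) pair of character positions.
    """
    if not density:
        return 0, 0
    peak = max(range(len(density)), key=density.__getitem__)
    start = peak
    while start > 0 and density[start - 1] >= threshold:
        start -= 1
    end = peak
    while end < len(density) - 1 and density[end + 1] >= threshold:
        end += 1
    return start, end + 1
```

Because the signal in this sketch has one value per character of the document, the returned positions index directly into the original document, so document[start:end] gives the candidate main content. The threshold would in practice be chosen empirically, as claim 11 indicates.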